INFORMATIVE QUANTILE FUNCTIONS AND
IDENTIFICATION OF PROBABILITY DISTRIBUTION TYPES


by  Emanuel  Parzen 

Department  of  Statistics 
Texas  A&M  University 


Abstract 


A problem of great importance to statistical data
analysts is quick identification of possible probability
distributions for observed data, and classification of tail
behavior of probability distributions. This paper discusses
the informative quantile function
$IQ(u) = \{Q(u) - Q(0.5)\} \div 2\{Q(0.75) - Q(0.25)\}$,
its use to identify probability models for observed data, and
its use to provide concepts of "representative distributions"
which illustrate the different types of shapes and tail
behavior that real distributions can have. This paper also
discusses estimators of tail exponents; they can be used to
identify outlying data values, and more centrally to identify
possible distributions to fit to data.
0. Prologue: keys, two-keys, and statistical signals

This  paper  introduces  the  informative  quantile  function; 
its  definition  is  probability  based,  its  properties  can  be 
studied  both  mathematically  and  empirically,  and  it  provides 
unified  definitions  and  practical  estimators  of  the  tail  types 
of  probability  distributions  that  can  fit  an  observed  batch  of 
data.  Illustrative  tables  of  tail  values  of  informative 
quantile  functions  of  familiar  distributions  are  given;  they 
provide  new  types  of  keys  (and  two-keys)  for  exploratory 
data  analysis  of  a  (random)  sample  (of  a  random  variable) . 

A key for exploratory data analysis is defined to be a
method of data detection by which we can familiarize
ourselves "with the data, get a rough idea of potential
problems, and look for both obvious and subtle clues about
the process that generated the data and the process that
processed the data before we got to see it" [Welsch commenting
on Parzen (1979)]. When a key is based on concepts of
probability theory (and thus ultimately also provides methods
of data inference and confirmatory data analysis), we call it
a two-key.

Keys which are also two-keys provide statistical signals.
One important role of numerical statistical signals is to be
appended to statistical graphics to help guide the viewer's
attention to the graphical statistical signals (significant
features of the graphs). In support of the proposition that the
best keys are two-keys, we conclude with a statement by
W. E. Deming entitled "Statistical Work and Computers." (We
do not know where it was published, and believe it to have been
written in the early 1970's.)

The  feature  that  distinguishes  the  statistician 
from  other  professions  is  his  use  of  the  theory  of 
probability.  The  statistician  requires  knowledge 
of  statistical  theory.  To  fulfill  his  duties  in 
professional  practice,  he  must  distinguish  between 
knowledge  and  wisdom.  He  is  a  scientist,  but  also 
an  artist.  He  requires  wisdom  to  make  a  good  choice 
of  problem  and  a  choice  of  statistical  procedure 
that  will  be  valid  and  feasible  under  the 
circumstances.

The  computer  can  be  the  statistician's  servant, 
though  many  people  are  content  if  it  is  the  other 
way  around.  Many  firms  today  have  magnificent 
information  systems,  but  too  often  these  systems 
fail  to  present  information  as  wisdom.  The 
statistician,  in  his  aim  to  find  causes  of  variation 
in  product  (synonymous  with  poor  quality  and  high 
costs), may use data from an information system,
but  he  adapts  the  system  to  calculate  statistical 
signals.  It  is  more  important  to  have  a  system  to 
improve  performance  than  to  have  a  system  that 
merely  tells  us  where  we  are  now.  The  statistician 
transforms  information  into  a  living  force  for  the 
advancement  of  knowledge  and  for  improvement  of 
quality  and  output,  industrial  and  agricultural. 
1. Quantile and sample quantile functions

Various aspects of the probability distribution of a
random variable X are described by its:

distribution function       $F(x) = \Pr[X \le x]$,  $-\infty < x < \infty$;
probability density         $f(x) = F'(x)$,  $-\infty < x < \infty$;
quantile function           $Q(u) = F^{-1}(u)$,  $0 < u < 1$;
quantile density function   $q(u) = Q'(u)$,  $0 < u < 1$;
density-quantile function   $fQ(u) = f(F^{-1}(u)) = \{q(u)\}^{-1}$,  $0 < u < 1$;
score function              $J(u) = -(fQ)'(u)$,  $0 < u < 1$.
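As a worked illustration (not taken from the original text), the six functions can be written out for the exponential distribution with scale parameter $\sigma$, the parametrization used in section 4. The symbols are those defined above; only the choice of example is an assumption.

```latex
\begin{align*}
F(x)  &= 1 - e^{-x/\sigma}, \quad x \ge 0, \\
f(x)  &= \sigma^{-1} e^{-x/\sigma}, \\
Q(u)  &= \sigma \log (1-u)^{-1}, \\
q(u)  &= \sigma (1-u)^{-1}, \\
fQ(u) &= \sigma^{-1} (1-u) = \{q(u)\}^{-1}, \\
J(u)  &= -(fQ)'(u) = \sigma^{-1}.
\end{align*}
```

The score function is constant here, which is the case $0 < J^* < \infty$ in the classification of medium tails given in the next section.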


Let $X_1, X_2, \ldots, X_n$ be a data set. The keys we propose, to
gain insight into the processes generating the data, become
two-keys when we assume that the data batch is a random sample of a
random variable X. The sample distribution function $\tilde F(x)$ and
sample quantile function $\tilde Q(u)$ are defined in terms of the order
statistics $X_{1n} \le X_{2n} \le \cdots \le X_{nn}$ of the sample:

$$\tilde F(x) = j/n, \quad X_{jn} \le x < X_{(j+1)n};$$

$$\tilde Q(u) = X_{jn}, \quad (j-1)/n < u \le j/n.$$

In practice we use a version of $\tilde Q(u)$
which is piecewise linear between the values

$$\tilde Q(j/(n+1)) = X_{jn}, \quad j = 1, \ldots, n.$$

For graphical data analysis, we transform the sample quantile
function $\tilde Q(u)$ to a normalized version, called the sample
informative quantile function. The values of the sample IQ(u), as
u tends to 0 and 1, provide diagnostic measures of the type of
probability distribution. An important classification of "type"
is in terms of tail exponents.
2. Tail Exponents Classification of Probability Laws

From extreme value theory, statisticians have long realized
that it is useful to classify distributions according to their
tail behavior (behavior of F(x) as x tends to $\pm\infty$). It is usual
to distinguish three main types of distributions, called (1)
limited, (2) exponential, and (3) algebraic. This classification
can also be expressed in terms of the density-quantile function
fQ(u); we call the types short, medium, and long tail.

A reasonable assumption about the distributions that occur
in practice is that their density-quantile functions are
regularly varying in the sense that there exist tail exponents
$\alpha_0$ and $\alpha_1$ such that, as $u \to 0$,

$$fQ(u) = u^{\alpha_0} L_0(u), \quad fQ(1-u) = u^{\alpha_1} L_1(u),$$

where $L_j(u)$ for j = 0, 1 is a slowly varying function.

A function L(u), 0 < u < 1, is usually defined to be slowly
varying as $u \to 0$ if, for every y in 0 < y < 1, $L(yu)/L(u) \to 1$ or
$\log L(yu) - \log L(u) \to 0$. For estimation of tail exponents
we will require further that, as $u \to 0$,

$$\int_0^1 \{\log L(yu) - \log L(u)\}\,dy \to 0,$$

which we call integrally slowly varying. An example of a slowly
varying function is $L(u) = \{\log u^{-1}\}^{\beta}$; this is proved in
section 9.


Classification of tail behavior of probability laws

A probability law has a left tail type and a right tail
type depending on the values of $\alpha_0$ and $\alpha_1$. If $\alpha$ is the tail
exponent, we define:

$\alpha < 0$          super short tail
$0 \le \alpha < 1$    short tail
$\alpha = 1$          medium tail
$\alpha > 1$          long tail

Medium tailed distributions are further classified by the value
of $J^* = \lim J(u)$:

$\alpha = 1$, $J^* = 0$              medium-long tail
$\alpha = 1$, $0 < J^* < \infty$     medium-medium tail
$\alpha = 1$, $J^* = \infty$         medium-short tail

One immediate insight into the meaning of tail behavior is
provided by the hazard function

$$h(x) = f(x) \div \{1 - F(x)\}$$

with hazard quantile function $hQ(u) = fQ(u) \div (1-u)$. The convergence
behavior of h(x) as $x \to \infty$ is the same as that of hQ(u) as $u \to 1$.

From the definitions one sees that $h^* = \lim h(x)$ satisfies

$h^* = \infty$         (increasing hazard rate)   Short or medium-short tail
$0 < h^* < \infty$     (constant hazard rate)     Medium-medium tail
$h^* = 0$              (decreasing hazard rate)   Long or medium-long tail
3. Unitized and Informative Quantile Functions

If one can define "universal" location and scale
parameters, denoted $\mu_1$ and $\sigma_1$ respectively, then one can define
a normalization of the quantile function which depends only
on its shape (and is independent of location and scale) by

$$Q_1(u) = \frac{Q(u) - \mu_1}{\sigma_1}.$$

We propose

$$\mu_1 = Q(0.5), \quad \sigma_1 = Q'(0.5) = q(0.5).$$

We call $Q_1(u)$ the unitized quantile function.

One can distinguish three kinds of estimators of parameters
[such as $\mu_1$ and $\sigma_1$]: fully non-parametric [denoted $\tilde\mu_1$ and
$\tilde\sigma_1$], fully parametric [denoted $\hat\mu_1$ and $\hat\sigma_1$], and functional
[estimators which are the parameters of smoothed
quantile functions obtained by smoothing the raw or fully
non-parametric estimator $\tilde Q(u)$]. The shape of Q(u) must be
inferred before one can efficiently estimate $\mu$ and $\sigma$ using fully
parametric (or robust parametric) estimators.

A fully non-parametric estimator of Q(0.5) is $\tilde Q(0.5)$. A
fully non-parametric estimator of q(0.5) is more difficult to
define. We therefore consider quick and dirty approximators of
q(0.5) of the form

$$\sigma_p = \{Q(0.5 + p) - Q(0.5 - p)\} \div 2p,$$

where 0 < p < 0.5. We usually take p = 0.25; then we approximate
q(0.5) by

$$\sigma_{0.25} = 2\{Q(0.75) - Q(0.25)\}.$$

We call

$$IQ(u) = \frac{Q(u) - Q(0.5)}{2\{Q(0.75) - Q(0.25)\}}$$

the informative quantile function.

We compute IQ(u), but graphically we plot the truncated
informative quantile function

TIQ(u) = -1      if IQ(u) < -1,
       = 1       if IQ(u) > 1,
       = IQ(u)   if |IQ(u)| $\le$ 1.

In addition to the plot of TIQ(u), we report the values of IQ(u)
at u = 0.01, 0.05, 0.10, 0.25, 0.75, 0.90, 0.95, 0.99. Truncating
the values of IQ(u) in our plot enables us to see the "middle"
of the distribution. The ends (tails) of the distributions are
described numerically by the extreme values of IQ(u).
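The computation just described can be sketched in a few lines of Python. This is a minimal illustration rather than the author's program: the function names are hypothetical, and the sample quantile is assumed to be the piecewise linear version that takes the value of the j-th order statistic at u = j/(n+1).

```python
def sample_quantile(data, u):
    """Piecewise-linear sample quantile with Q(j/(n+1)) = j-th order statistic.

    The j/(n+1) plotting position is an assumed convention for this sketch.
    """
    xs = sorted(data)
    n = len(xs)
    pos = u * (n + 1)              # fractional rank, 1-based
    if pos <= 1:
        return xs[0]
    if pos >= n:
        return xs[-1]
    j = int(pos)                   # rank of the lower neighbouring order statistic
    frac = pos - j
    return xs[j - 1] + frac * (xs[j] - xs[j - 1])


def informative_quantile(data, u):
    """Sample IQ(u) = {Q(u) - Q(0.5)} / (2 * interquartile range)."""
    median = sample_quantile(data, 0.5)
    iqr = sample_quantile(data, 0.75) - sample_quantile(data, 0.25)
    return (sample_quantile(data, u) - median) / (2.0 * iqr)


def truncated_iq(data, u):
    """TIQ(u): IQ(u) clipped to the interval [-1, 1] for plotting."""
    return max(-1.0, min(1.0, informative_quantile(data, u)))


# Values of u at which the text reports IQ(u) numerically.
TAIL_POINTS = (0.01, 0.05, 0.10, 0.25, 0.75, 0.90, 0.95, 0.99)


def tail_report(data):
    """Map each reporting point u to the untruncated sample IQ(u)."""
    return {u: informative_quantile(data, u) for u in TAIL_POINTS}
```

By construction IQ(0.75) - IQ(0.25) = 0.5 for any sample, so the informative information lies in the remaining reported values.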
For convenience in seeing at a glance in a plot of IQ(u)
its behavior, especially as u tends to 0 and 1, we plot on the
same graph the IQ(u) of a uniform distribution (it is a straight
line with values -0.5 and 0.5 at u = 0 and 1 respectively).

Example: Super Short Distributions. An important example
of a super-short distribution ($\alpha < 0$) is $X = -\cos \pi U$ where U is
uniform [0,1]. Since $-\cos \pi u$ is an increasing function of u,
the quantile function of X is $Q(u) = -\cos \pi u$, with quantile
density and density-quantile

$$q(u) = \pi \sin \pi u, \quad fQ(u) = \frac{1}{\pi \sin \pi u}.$$

As $u \to 0$, $fQ(u) \sim u^{-1}$ so $\alpha_0 = -1$. The distribution is symmetric,
in the sense that q(1-u) = q(u); therefore $\alpha_1 = -1$. The
interquartile range IQR = $\sqrt{2}$; the informative quantile function
is $IQ(u) = (-.35) \cos \pi u$. Therefore IQ(0) = -.35, IQ(1) = .35.
These values are taken as typical values of super-short
distributions.
4. Examples of theoretical informative quantile functions

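A normal distribution is defined in terms of the standard
normal density $\phi(x)$ and distribution $\Phi(x)$,

$$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp(-\tfrac{1}{2} x^2), \quad \Phi(x) = \int_{-\infty}^{x} \phi(y)\,dy;$$

a distribution F(x) is called normal when it can be represented

$$F(x) = \Phi\left(\frac{x-\mu}{\sigma}\right), \quad f(x) = \frac{1}{\sigma}\,\phi\left(\frac{x-\mu}{\sigma}\right)$$

with quantile function

$$Q(u) = \mu + \sigma \Phi^{-1}(u).$$

The parameters $\mu_1$ and $\sigma_1$ are related to $\mu$ and $\sigma$ by $\mu_1 = \mu$ and
$\sigma_1 = \sigma\sqrt{2\pi}$. The unitized normal distribution (for which $\sigma_1 = 1$) has
density

$$f_1(x) = \sqrt{2\pi}\,\phi(x\sqrt{2\pi}) = e^{-\pi x^2}$$

which is Stigler's proposal for a standardized normal density
[Stigler (1982)].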

An exponential distribution has density

$$f(x) = \frac{1}{\sigma} f_0\left(\frac{x}{\sigma}\right), \quad f_0(x) = e^{-x}, \quad x \ge 0,$$

and quantile function

$$Q(u) = \sigma \log (1-u)^{-1}.$$

Although its mean equals $\sigma$, we regard $\sigma$ as a scale parameter
rather than a location parameter. The parameters $\mu_1$, $\sigma_1$, and
$\sigma_{0.25}$ satisfy

$$\mu_1 = \sigma \log 2 = (.69)\,\sigma; \quad \sigma_1 = 2\sigma; \quad \sigma_{0.25} = 2.2\,\sigma.$$

The unitized and informative exponential quantile functions are

$$Q_1(u) = -0.5 \log 2(1-u),$$

$$IQ(u) = -0.45 \log 2(1-u).$$

The possible shapes of informative quantile functions are
best described by plots of the Weibull distribution with
parameter $\beta$, which has standard quantile function

$$Q(u) = \{\log (1-u)^{-1}\}^{\beta}.$$

Graphs of the informative quantile functions of the Weibull
distribution for $\beta$ = .1 (.1) 2.0 are given in the appendix.
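The theoretical informative quantile functions of this section can be evaluated directly from their quantile functions. The sketch below is illustrative only; the helper names are hypothetical, and the standard normal quantile is taken from Python's `statistics.NormalDist`. Because IQ(u) does not depend on location and scale, standard forms of each distribution suffice.

```python
import math
from statistics import NormalDist


def iq(Q, u):
    """Informative quantile function of a distribution with quantile function Q."""
    return (Q(u) - Q(0.5)) / (2.0 * (Q(0.75) - Q(0.25)))


def weibull_Q(beta):
    """Standard Weibull quantile function Q(u) = {log 1/(1-u)}**beta."""
    return lambda u: (-math.log(1.0 - u)) ** beta


normal_Q = NormalDist().inv_cdf      # standard normal quantile function
exponential_Q = weibull_Q(1.0)       # the exponential is the Weibull with beta = 1

for name, Q in (("normal", normal_Q),
                ("exponential", exponential_Q),
                ("Weibull, beta = 0.5", weibull_Q(0.5))):
    values = ", ".join(f"IQ({u}) = {iq(Q, u):.3f}" for u in (0.05, 0.95, 0.99))
    print(f"{name}: {values}")
```

The printed values can be compared with the entries of the tables in section 6.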
5. Outlying data value interpretation of IQ(u)

The sample informative quantile function is defined by

$$IQ(u) = \{Q(u) - Q(0.5)\} \div 2\,IQR$$

where IQR is the sample interquartile range: IQR = Q(0.75) -
Q(0.25). The truncated sample informative quantile function
TIQ(u) is defined to be IQ(u) truncated at $\pm 1$.

Hoaglin, Mosteller, and Tukey (1983, p. 39) introduce
techniques for identifying outlying (or outside) data values
as those lying outside the interval

(Q(0.25) - (1.5) IQR, Q(0.75) + (1.5) IQR).

We regard as outlying data values those lying outside the interval

(Q(0.5) - 2 IQR, Q(0.5) + 2 IQR).

Outlying data values appear on the plot of TIQ(u) as values
truncated to $\pm 1$. The actual values of outlying data values are
represented by the values of IQ(u) for u = 0.01, 0.05, 0.10,
0.90, 0.95, 0.99. The next section discusses how these quantities
provide quick and dirty estimators of the tail type of the
distributions that can fit the sample.
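The two outlier rules can be compared side by side. The following sketch is only an illustration: the function name is hypothetical, and the quartiles are computed with `statistics.quantiles`, whose default "exclusive" method is assumed here as a stand-in for the sample quantile function.

```python
from statistics import quantiles


def outlying_values(data):
    """Return (values outside the quartile-based fences,
               values outside the median-based IQ interval)."""
    q1, median, q3 = quantiles(data, n=4)   # sample quartiles
    iqr = q3 - q1
    # Interval (Q(0.25) - 1.5 IQR, Q(0.75) + 1.5 IQR)
    quartile_rule = [x for x in data
                     if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
    # Interval (Q(0.5) - 2 IQR, Q(0.5) + 2 IQR), i.e. |IQ| > 1
    iq_rule = [x for x in data if abs(x - median) > 2.0 * iqr]
    return quartile_rule, iq_rule


sample = list(range(1, 100)) + [300]
print(outlying_values(sample))
```

When the median lies midway between the quartiles the two intervals coincide; they differ for skewed samples, where the median-based interval is centered on Q(0.5) rather than on the midpoint of the quartiles.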
Other useful numerical diagnostics are estimators of the
IQ-mean $\mu_{IQ}$ and IQ-standard-deviation $\sigma_{IQ}$, defined by

$$\mu_{IQ} = \frac{\mu - \mu_1}{\sigma_{0.25}}, \quad \sigma_{IQ} = \frac{\sigma}{\sigma_{0.25}},$$

where $\mu$ and $\sigma^2$ are the mean and variance of Q(u). The logarithm
(to the base e) of $\sigma_{IQ}$ is denoted log SDIQ. For a normal
distribution $\sigma_{IQ} = 1/2.7$ and log SDIQ = -1 approximately. A
test that the sample has a Gaussian distribution can be based
on testing if the sample estimator of log SDIQ is significantly
different from -1.
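A sketch of how these diagnostics might be computed from a sample follows. It assumes the definitions $\mu_{IQ} = (\mu - \mu_1)/\sigma_{0.25}$ and $\sigma_{IQ} = \sigma/\sigma_{0.25}$ with $\sigma_{0.25}$ equal to twice the interquartile range; the function name and the use of evenly spaced normal scores as a stand-in for a Gaussian sample are assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, pstdev, quantiles


def iq_summary(data):
    """Return (IQ-mean, IQ-standard-deviation, log SDIQ) of a sample."""
    q1, median, q3 = quantiles(data, n=4)   # sample quartiles
    scale = 2.0 * (q3 - q1)                 # twice the interquartile range
    mean_iq = (mean(data) - median) / scale
    sd_iq = pstdev(data) / scale
    return mean_iq, sd_iq, math.log(sd_iq)


# Idealized "Gaussian sample": standard normal quantiles on an even grid.
n = 1000
normal_scores = [NormalDist().inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
mean_iq, sd_iq, log_sdiq = iq_summary(normal_scores)
print(f"IQ-mean = {mean_iq:.3f}, IQ-SD = {sd_iq:.3f}, log SDIQ = {log_sdiq:.3f}")
```

For this idealized input log SDIQ comes out close to -1, the Gaussian reference value quoted above; how far a real sample may depart from -1 before the difference is significant is not addressed by this sketch.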




6. Tables of tail values of informative quantile functions

One use of the informative quantile function IQ(u) of a
sample is to determine quickly the probability distributions that
might fit the sample. One can readily distinguish whether the
data could be fit by a normal distribution or an exponential
distribution [and thus determine the "probability of success"
if one were to apply a more formal goodness of fit test].

However, no standard parametric model may fit the data, and
statistical data analysis must identify significant features
of the data "non-parametrically".

Statistical scientists are seeking to define concepts which
illustrate the different types of shapes and tail behavior that
real distributions can have. Hoaglin, Mosteller, and Tukey
(1983, p. 316) use language such as "neutral tailed (Gaussian)"
and "stretch-tailed (Cauchy)". To describe the notion of tail
weight, they write that it "expresses how the extreme portion
of the distribution spreads out relative to the width of the
center." As an index of tail behavior, they introduce (p. 323)

$$\{Q(0.9) - Q(0.1)\} \div \{Q(0.75) - Q(0.25)\} = 2\{IQ(0.9) - IQ(0.1)\}.$$

As indices of tail behavior, this paper proposes IQ(u)
at u = 0.01, 0.05, 0.1, 0.9, 0.95, 0.99. The true values of
these indices for various familiar distributions are given in
the tables. These indices are keys (useful for exploratory
data analysis of what's unusual or extraordinary about a data
set) and two-keys (provide estimates of the tail exponents
and tail types of distributions that might have generated the
data).
Table 6A

Tail Values of Informative Quantile Function IQ(u)
Standard Distributions

* = Approximate value of u at which IQ(u) = 1.

Distribution      *     u = .01      .05      .10      .90      .95      .99

Normal           --       -.862    -.610    -.475     .475     .610     .862
Exponential     .95       -.311    -.292    -.268     .732    1.048    1.780
Logistic        .99      -1.046    -.670    -.500     .500     .670    1.046
Double Exp      .97      -1.411    -.830    -.568     .580     .830    1.411
Cauchy          .92      -7.955   -1.578    -.769     .769    1.578    7.954
Extreme Value            -1.346    -.828    -.599     .382     .465    0.602
Log Normal      .91       -.310    -.278    -.278     .895    1.438    3.178
Super Short      --       -.353    -.349    -.336     .336     .349    0.353
Table 6B

Tail Values of Informative Quantile Function IQ(u)
Weibull $Q(u) = \{\log (1-u)^{-1}\}^{\beta}$

* = Approximate value of u at which IQ(u) = 1.

$\beta$      *     u = .01      .05      .10      .90      .95      .99

 .1     --      -1.107    -.735    -.550     .409     .505     .668
 .2              -.921    -.655    -.506     .438     .549     .743
 .3     --       -.777    -.585    -.466     .468     .595     .826
 .4     --       -.662    -.525    -.430     .500     .646     .919
 .5    1.0       -.571    -.473    -.396     .534     .701    1.024
 .6    .98       -.498    -.427    -.366     .570     .760    1.142
 .7    .97       -.437    -.387    -.338     .607     .824    1.275
 .8    .96       -.388    -.351    -.312     .647     .893    1.424
 .9    .95       -.346    -.320    -.295     .689     .967    1.592
1.0    .94       -.311    -.292    -.273     .732    1.048    1.780
1.1    .93       -.281    -.267    -.252     .778    1.135    1.993
1.2    .93       -.255    -.245    -.233     .827    1.229    2.232
1.3    .92       -.232    -.225    -.216     .878    1.331    2.502
1.4    .91       -.212    -.207    -.200     .931    1.440    2.806
1.5    .90       -.195    -.191    -.185     .987    1.559    3.148
1.6    .89       -.179    -.177    -.172    1.046    1.687    3.54
1.7    .89       -.165    -.163    -.159    1.107    1.825    3.969
1.8    .88       -.153    -.151    -.147    1.172    1.974    4.459
1.9    .88       -.141    -.140    -.137    1.240    2.135    5.012
2.0    .87       -.131    -.130    -.128    1.311    2.309    5.635
2.1    .87       -.121    -.121    -.119    1.386    2.497    6.338
2.2    .86       -.112    -.112    -.111    1.464    2.700    7.130
2.3    .86       -.104    -.104    -.103    1.546    2.919    8.023
2.4    .85       -.097    -.097    -.096    1.633    3.155    9.031


Table 6C

Tail Values of Informative Quantile Function IQ(u)
Lognormal $Q(u) = \exp\{\lambda \Phi^{-1}(u)\}$

Approximate value of u at which IQ(u) = 1.

$\lambda$     u = .01      .05      .10       .90        .95        .99

0.5      -.500    -.408    -.344      .653       .928      1.600
1.0      -.310    -.278    -.246      .895      1.438      3.178
1.5      -.203    -.192    -.179     1.223      2.260      6.655
2.0      -.138    -.134    -.128     1.666      3.594     14.449
2.5      -.096    -.094    -.092     2.266      5.761     32.083
3.0      -.067    -.067    -.066     3.077      9.284     72.169
3.5      -.048    -.047    -.047     4.175     15.012    163.511
4.0      -.034    -.034    -.034     5.661     24.322    371.888
4.5      -.024    -.024    -.024     7.673     39.454    847.538
5.0      -.017    -.017    -.017    10.398     64.041       --
5.5      -.012    -.012    -.012    14.089    103.988       --
6.0      -.009    -.009    -.009    19.087    168.886       --
6.5      -.006    -.006    -.006    25.858    274.315       --
7.0      -.004    -.004    -.004    35.029    445.586       --
7.5      -.003    -.003    -.003    47.452    723.814       --
8.0      -.002    -.002    -.002    64.280       --         --
7. Example of sample informative quantile analysis

A data set extensively analyzed at Bell Telephone
Laboratories (and discussed in a recent book on graphical methods
of data analysis by Chambers, Cleveland, Kleiner, and Tukey
(1983)) consists of Stamford, Conn., monthly maximum ozone levels.
Sample size n = 136, sample median $\mu_1$ = 80, sample mean $\mu$ = 89.7,
twice interquartile range $\sigma_{0.25}$ = 147.5, and standard deviation
$\sigma$ = 52.1. Rather than reporting the original data $X_1, \ldots, X_n$ we
report (table 7A) the normalized values $(X_j - \mu_1) \div \sigma_{0.25}$ which are
used to plot IQ(u); a plot of Q(u) is given on p. 15 of Chambers
et al. Numerical statistical signals are provided by the tail
values:

u        .05     .10     .90     .95

IQ(u)   -.38    -.33     .61     .83

By consulting the table of Weibull informative quantile values,
as a first guess of a distribution to fit this data one takes
Weibull with parameter $\beta$ = 0.8. The graph of IQ(u) in Figure 7A
also suggests to us that a Weibull distribution provides a good
first approximation. How to refine this approximation is a
problem treated by our ONESAM data analysis program.

An alternate approach to modeling this data is to find a
transformation to normality; one would then report as one's
conclusion that the cube root of the Stamford ozone data is normally
distributed. We believe that this conclusion must be considered
curve fitting, while a conclusion that the data is fit by a
Weibull distribution with $\beta$ in a specified range represents a
curve fit with scientific insight (which may help to explain
the physical mechanisms generating the data).
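The table lookup used to obtain a first guess of $\beta$ can be mimicked numerically. The sketch below is not the ONESAM program: it recomputes the Weibull IQ values from the quantile function on the grid of Table 6B and picks the grid value minimizing the sum of squared differences from the four observed tail values. The distance criterion, the grid, and the function names are assumptions made for illustration.

```python
import math


def weibull_iq(beta, u):
    """Theoretical IQ(u) for the Weibull quantile function {log 1/(1-u)}**beta."""
    Q = lambda v: (-math.log(1.0 - v)) ** beta
    return (Q(u) - Q(0.5)) / (2.0 * (Q(0.75) - Q(0.25)))


# Observed tail values of the sample IQ(u) reported above for the ozone data.
observed = {0.05: -0.38, 0.10: -0.33, 0.90: 0.61, 0.95: 0.83}


def distance(beta):
    """Sum of squared differences between theoretical and observed IQ values."""
    return sum((weibull_iq(beta, u) - value) ** 2
               for u, value in observed.items())


beta_grid = [k / 10 for k in range(1, 25)]   # 0.1, 0.2, ..., 2.4 as in Table 6B
best_beta = min(beta_grid, key=distance)
print(f"closest Weibull row on the grid: beta = {best_beta}")
```

Under this particular criterion the selected grid value lies between 0.6 and 0.9, in the neighbourhood of the first guess quoted in the text; a different weighting of the four tail values could favor an adjacent row.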
8. Super-short distributions as harbingers of bimodality

When the sample informative quantile function indicates a
"super short" distribution, the true distribution may not be a
super-short unimodal distribution, but a bimodal distribution.

The manner in which a super-short distribution may be
indicative of bimodality is illustrated by the two-sample problem.
One has a sample of values from a distribution F(x), and a sample
of values from a distribution G(x). When the samples are pooled,
they are regarded as a sample from a distribution H(x) which can
be represented $H(x) = \lambda F(x) + (1-\lambda) G(x)$ where $\lambda$ is the fraction
of the pooled sample from F(x). One often seeks to test the
hypothesis $H_0$: F(x) = G(x). The informative quantile plot of
H(x) is super-short when F and G have their modes far apart.


To illustrate the ideas, assume $F(x) = \Phi(x)$, $G(x) = \Phi(x-\delta)$,
$H(x) = 0.5\{\Phi(x) + \Phi(x-\delta)\}$. A random sample from H(x), of
size 40, was simulated for $\delta$ = 1, 2, 3, 4, 5, 6. The observed
values of IQ(u) are given in the following table.

$\delta$     u = .05      .10      .25      .75      .90      .95

1      -.6566   -.6069   -.2110    .2890    .5005    .6570
2      -.4450   -.3553   -.2044    .2956    .5847    .7258
3      -.4077   -.2801   -.2034    .2966    .5012    .6108
4      -.4586   -.4260   -.2908    .2092    .3326    .4340
5      -.4350   -.3620   -.2649    .2351    .4079    .4191
6      -.3228   -.2915   -.1841    .3159    .3795    .4179

Other summary statistics of the samples were

$\delta$    Median    Interquartile    Mean     St. Dev.    Log
                  Range            IQ       IQ          SDIQ

1      .62       1.46             .01      .3689       -.997
2     1.10       2.07             .05      .3347      -1.095
3      .97       2.85             .05      .3024      -1.196
4     2.23       3.96            -.03      .2846      -1.257
5     2.36       4.00             .01      .2900      -1.238
6     2.39       5.28             .05      .2669      -1.321

The values of IQ(0.05), IQ(0.95) and log SDIQ in the case $\delta$ = 1
indicate a Gaussian distribution. The values of IQ(0.05)
and IQ(0.95) in the cases $\delta$ = 4, 5, 6 indicate a super-short
distribution, which leads us to check the quantile functions of
the pooled sample for the possibility of bimodality, which often
indicates that the two samples do not have the same distributions.
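The effect of the separation $\delta$ can also be seen without simulation, by computing the theoretical IQ(u) of the mixture $H(x) = 0.5\{\Phi(x) + \Phi(x-\delta)\}$. The sketch below inverts H by bisection; the function names, the bracketing interval, and the number of iterations are assumptions made for illustration, and the values it produces are population values rather than the simulated ones tabulated above.

```python
from statistics import NormalDist

PHI = NormalDist().cdf   # standard normal distribution function


def mixture_quantile(u, delta, lo=-20.0, hi=30.0):
    """Quantile of H(x) = 0.5*{Phi(x) + Phi(x - delta)} found by bisection."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 0.5 * (PHI(mid) + PHI(mid - delta)) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mixture_iq(u, delta):
    """Theoretical informative quantile function of the pooled distribution."""
    q = lambda v: mixture_quantile(v, delta)
    return (q(u) - q(0.5)) / (2.0 * (q(0.75) - q(0.25)))


for delta in range(0, 7):
    print(f"delta = {delta}: IQ(0.05) = {mixture_iq(0.05, delta):.3f}, "
          f"IQ(0.95) = {mixture_iq(0.95, delta):.3f}")
```

For $\delta$ = 0 the values are those of the normal row of Table 6A; for $\delta$ = 6 they are close to the $\pm .35$ taken in section 3 as typical of super-short distributions.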
9. Theoretical and empirical formulas for computing tail
exponents

The properties of slowly varying functions are best
understood by considering an example.

Lemma: $L(u) = \{\log u^{-1}\}^{\beta}$ is (integrally) slowly varying
as $u \to 0$.

Proof:

$$\log L(yu) = \beta \log \log (yu)^{-1} = \beta \log\{\log y^{-1} + \log u^{-1}\},$$

$$\log L(yu) - \log L(u) = \beta \log\{1 + (\log y^{-1} / \log u^{-1})\},$$

$$|\log L(yu) - \log L(u)| \le |\beta|\,(\log y^{-1} / \log u^{-1}).$$

Verify that $\int_0^1 |\log y|\,dy < \infty$, and $1/\log u^{-1} \to 0$ as $u \to 0$.
One can conclude that L(u) is slowly varying and also integrally
slowly varying.

The representation of fQ(u) suggests a formula for
computation of tail exponents $\alpha_0$ and $\alpha_1$ (which may be adapted to
provide estimators from data).

Theorem: Computation of tail exponents

$$-\alpha_0 = \lim_{u \to 0} \int_0^1 \{\log fQ(yu) - \log fQ(u)\}\,dy.$$

Equivalently

$$-\alpha_0 = \lim_{p \to 0} \frac{1}{p} \int_0^p \log fQ(t)\,dt - \log fQ(p).$$

Similarly

$$-\alpha_1 = \lim_{u \to 0} \int_0^1 \{\log fQ(1-yu) - \log fQ(1-u)\}\,dy
           = \lim_{p \to 0} \frac{1}{p} \int_{1-p}^1 \log fQ(t)\,dt - \log fQ(1-p).$$

Proof: $\log fQ(u) = \alpha_0 \log u + \log L_0(u)$,

$$\log fQ(yu) - \log fQ(u) = \alpha_0 \log y + \log L_0(yu) - \log L_0(u).$$

Since $\int_0^1 \log y\,dy = -1$, we conclude that

$$\int_0^1 \{\log fQ(yu) - \log fQ(u)\}\,dy = -\alpha_0 + o(1) \quad \text{as } u \to 0.$$

Similarly one derives the formula for $\alpha_1$.

Because the density-quantile and quantile-density functions
are reciprocals, we obtain similar formulas for q(u) which may
be easier to implement in practice:

$$q(u) = u^{-\alpha_0} \{L_0(u)\}^{-1}, \quad \text{as } u \to 0;$$

$$q(u) = (1-u)^{-\alpha_1} \{L_1(1-u)\}^{-1}, \quad \text{as } u \to 1;$$

$$\alpha_0 = \lim_{u \to 0} \int_0^1 \{\log q(yu) - \log q(u)\}\,dy;$$

$$\alpha_1 = \lim_{u \to 0} \int_0^1 \{\log q(1-yu) - \log q(1-u)\}\,dy.$$
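As a check on the signs in these formulas, they can be worked out for the standard exponential distribution, for which fQ(u) = 1 - u. This example is an illustration added here, not part of the original derivation.

```latex
\begin{align*}
\text{Right tail: } & fQ(1-u) = u, \text{ so } \alpha_1 = 1 \text{ and } L_1(u) = 1; \\
 & \int_0^1 \{\log fQ(1-yu) - \log fQ(1-u)\}\,dy
   = \int_0^1 \{\log (yu) - \log u\}\,dy
   = \int_0^1 \log y\,dy = -1 = -\alpha_1 . \\
\text{Left tail: } & fQ(u) = 1-u \to 1 \text{ as } u \to 0, \text{ so } \alpha_0 = 0; \\
 & \int_0^1 \{\log (1-yu) - \log (1-u)\}\,dy \to 0 = -\alpha_0
   \quad \text{as } u \to 0 .
\end{align*}
```

The exponential distribution is therefore short tailed on the left ($\alpha_0 = 0$) and medium tailed on the right ($\alpha_1 = 1$).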
For theoretical purposes it is often convenient to compute
tail exponents using formulas such as

$$\alpha_0 = \lim_{u \to 0} u \frac{d}{du} \log fQ(u) = -\lim_{u \to 0} \frac{u\,J(u)}{fQ(u)},$$

$$\alpha_1 = -\lim_{u \to 1} (1-u) \frac{d}{du} \log fQ(u) = \lim_{u \to 1} \frac{(1-u)\,J(u)}{fQ(u)}.$$


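In practice, we would estimate tail exponents from the
values of fQ(t) at an equispaced grid of points t = j/n,
j = 1, 2, ..., n-1. Let k and n tend to $\infty$ in such a way that k/n
tends to 0; define

$$-\alpha_{0,k} = \frac{1}{k} \sum_{j=1}^{k} \log fQ\left(\frac{j}{n}\right) - \log fQ\left(\frac{k}{n}\right),$$

$$-\alpha_{1,k} = \frac{1}{k} \sum_{j=1}^{k} \log fQ\left(1 - \frac{j}{n}\right) - \log fQ\left(1 - \frac{k}{n}\right).$$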


Conjectures to be proved are that

$$\alpha_0 = \lim_{k \to \infty,\ k/n \to 0} \alpha_{0,k}, \quad
  \alpha_1 = \lim_{k \to \infty,\ k/n \to 0} \alpha_{1,k}.$$


The rate of convergence can be very slow. If
$L(u) = \{\log u^{-1}\}^{\beta}$, then

$$\alpha_0 = \alpha_{0,k} + c\,\{\log (n/k)\}^{-1}.$$

The theoretical properties and practical implementation
of the foregoing estimators remain to be investigated.
Related estimators are given in Mason (1982) and the papers
referenced there.
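The behavior of the grid formulas can be explored numerically when fQ is known. The sketch below assumes the discretization in which the average of log fQ(j/n) over j = 1, ..., k is compared with log fQ(k/n); the function names and the choice of test distributions are likewise assumptions. It treats fQ as given, so it says nothing about estimating fQ from data.

```python
import math


def tail_exponents(fQ, n, k):
    """Return (alpha_0k, alpha_1k) computed from fQ on the grid j/n, j = 1..k.

    Assumes k is much smaller than n and fQ is positive on the grid points used.
    """
    left = (sum(math.log(fQ(j / n)) for j in range(1, k + 1)) / k
            - math.log(fQ(k / n)))
    right = (sum(math.log(fQ(1.0 - j / n)) for j in range(1, k + 1)) / k
             - math.log(fQ(1.0 - k / n)))
    return -left, -right


def cauchy_fQ(u):
    """Density-quantile function of the standard Cauchy distribution."""
    return math.sin(math.pi * u) ** 2 / math.pi


def exponential_fQ(u):
    """Density-quantile function of the standard exponential distribution."""
    return 1.0 - u


for name, fQ in (("Cauchy", cauchy_fQ), ("exponential", exponential_fQ)):
    a0, a1 = tail_exponents(fQ, n=10**6, k=1000)
    print(f"{name}: alpha_0 ~ {a0:.3f}, alpha_1 ~ {a1:.3f}")
```

For the Cauchy distribution both values come out near 2 (long tails), and for the exponential near 0 and 1, in line with the classification of section 2. Even with no slowly varying factor the values fall slightly short of the limits at k = 1000, which gives some feeling for the slow convergence noted above.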
APPENDIX

Informative Quantile Functions of Weibull Distributions with
Parameter $\beta$: $Q(u) = \{\log (1-u)^{-1}\}^{\beta}$

[Figure: panels titled "Informative Quantile - Weibull", including
$\beta$ = 1.0, 1.4, and 1.8.]


